Yes, you can probably map them all to Pillow images and then cast the column to the Image feature:
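from datasets import load_dataset
import requests
from PIL import Image as PILImage  # aliased so it does not clash with datasets.Image below

def to_pillow(examples):
    # Download each photo URL in the batch and open it as a Pillow image
    urls = examples['Photo']
    images = []
    for url in urls:
        image = PILImage.open(requests.get(url, stream=True).raw)
        images.append(image)
    examples['image'] = images
    return examples

dataset = load_dataset("TheNoob3131/mosquito-data")
dataset = dataset.map(to_pillow, batched=True)

from datasets import Image
# cast_column expects a feature instance, so pass Image() rather than the class
dataset = dataset.cast_column('image', Image())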
I’m going to cc @mariosasko here (map
seems to be very slow). Alternatively, you can do:
dataset.set_transform(to_pillow)
to do this on the fly.
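If the slowness comes from downloading the URLs one after another (an assumption on my side, I have not profiled it on this dataset), a thread pool might help. Below is a minimal sketch: `fetch_image` and `to_pillow_parallel` are hypothetical helper names, and the `timeout` and `max_workers` values are arbitrary defaults, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO


def fetch_image(url, timeout=10):
    """Download one URL and open it as a Pillow image (hypothetical helper)."""
    # Imported lazily so the batching logic below does not need these
    # libraries unless this default fetcher is actually called.
    import requests
    from PIL import Image as PILImage

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return PILImage.open(BytesIO(response.content))


def to_pillow_parallel(examples, fetch=fetch_image, max_workers=8):
    """Batched function: fetch every URL in the 'Photo' column concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so each image stays with its row.
        examples["image"] = list(pool.map(fetch, examples["Photo"]))
    return examples
```

It should be usable the same way as `to_pillow`, e.g. `dataset.map(to_pillow_parallel, batched=True)` or `dataset.set_transform(to_pillow_parallel)`. The `fetch` argument is only there so the download step can be swapped out.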